ABSTRACT

Several alternate sets of parameters that represent the
linear predictor are investigated as transmission parameters
for linear predictive speech compression systems. Although
each of these sets provides equivalent information about the
linear predictor, their properties under quantization are
different. The results of a comparative study of the
various parameter sets are reported. Specifically, it is
concluded that the reflection coefficients are the best set
for use as transmission parameters. A more detailed
investigation of the quantization properties of the
reflection coefficients is then carried out using a spectral
sensitivity measure. A method of optimally quantizing the
reflection coefficients is also derived. Using this method
it is demonstrated that logarithms of the ratios of the
familiar area functions possess approximately optimal
quantization properties. Also, a solution to the problem of
bit allocation among the various parameters is presented,
based on the sensitivity measure.

The use of another spectral sensitivity measure renders
logarithms of the ratios of normalized errors associated
with linear predictors of successive orders as the optimal
quantization parameters. Informal listening tests indicate
that the use of log area ratios for quantization leads to
better synthesis than the use of log error ratios.
I. INTRODUCTION

In recent years the method of linear prediction has
been quite successfully used in speech compression systems
[1]-[5]. In this method, speech is modeled by an all-pole
filter H(z) as shown in Fig. 1. The input to the filter is
either a sequence of pulses separated by the pitch period
for voiced sounds, or white noise for fricated (or unvoiced)
sounds. The parameters $a_k$, $1 \le k \le p$, are known as the
predictor coefficients, and G is the filter gain. For a
particular speech segment the filter parameters are obtained
by passing the speech signal through the inverse filter A(z)
(as in Fig. 2) and then minimizing the total-squared
prediction error

$$E = \sum_n e_n^2 = \sum_n \Big( s_n + \sum_{k=1}^{p} a_k s_{n-k} \Big)^2 \tag{1}$$

with respect to the $a_k$. If the signal $s_n$ is assumed to be zero
for n<0 and n>N (e.g. by multiplying it by a finite
window), the error minimization results in the set of
equations

$$\sum_{k=1}^{p} a_k R_{|i-k|} = -R_i, \qquad 1 \le i \le p, \tag{2}$$

Fig. 1. Discrete model of speech production as employed in
linear prediction. (b) Time-domain model.

Fig. 2. The error sequence $e_n$ as the output of an inverse
filter A(z).

where

$$R_i = \sum_{n=0}^{N-|i|} s_n s_{n+|i|} \tag{3}$$

is the autocorrelation function of the signal $s_n$. The set
of equations (2) can be recursively solved for the predictor
coefficients $a_k$ as follows:


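$$E_0 = R_0 \tag{4-a}$$

$$k_i = -\Big[ R_i + \sum_{j=1}^{i-1} a_j^{(i-1)} R_{i-j} \Big] \Big/ E_{i-1} \tag{4-b}$$

$$a_i^{(i)} = k_i, \qquad a_j^{(i)} = a_j^{(i-1)} + k_i \, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{4-c}$$

$$E_i = (1 - k_i^2) \, E_{i-1} \tag{4-d}$$
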
Equations (4-b,c,d) are solved recursively for i=1,2,...,p.
The final solution is given by

$$a_j = a_j^{(p)}, \qquad 1 \le j \le p. \tag{4-e}$$

The filter H(z) with the predictor coefficients obtained
from (4) is always stable, i.e. the poles of H(z) lie
inside the unit circle in the z-plane. Since H(z) is an
all-pole filter, stability also implies that H(z) is minimum
phase.
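
The recursive solution of the normal equations can be illustrated with a short sketch. It assumes the standard Levinson-Durbin recursion with the sign convention A(z) = 1 + sum a_k z^-k used in this paper; the function names `autocorrelation` and `levinson_durbin` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def autocorrelation(s, p):
    """R_0..R_p of a finite (already windowed) signal s, as in (3)."""
    s = np.asarray(s, dtype=float)
    return np.array([np.dot(s[:len(s) - i], s[i:]) for i in range(p + 1)])

def levinson_durbin(R, p):
    """Recursively solve sum_k a_k R_|i-k| = -R_i for a_1..a_p.

    Returns (a, k, E): predictor coefficients, reflection
    coefficients, and the minimum prediction error E_p.
    """
    R = np.asarray(R, dtype=float)
    a = np.zeros(p)      # a[j-1] holds a_j at the current order
    k = np.zeros(p)      # reflection coefficients k_1..k_p
    E = R[0]             # zeroth-order error, E_0 = R_0
    for i in range(1, p + 1):
        prev = a[:i - 1].copy()                    # a_1..a_{i-1} at order i-1
        acc = R[i] + np.dot(prev, R[i - 1:0:-1])   # R_i + sum_j a_j R_{i-j}
        k[i - 1] = -acc / E
        a[:i - 1] = prev + k[i - 1] * prev[::-1]   # order update of a_1..a_{i-1}
        a[i - 1] = k[i - 1]                        # new last coefficient is k_i
        E *= 1.0 - k[i - 1] ** 2                   # error shrinks by (1 - k_i^2)
    return a, k, E
```

The recursion needs on the order of p^2 operations and yields the reflection coefficients as a byproduct, which is why they cost nothing extra to transmit.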

The intermediate quantities $k_i$, $1 \le i \le p$, in (4) are
called the reflection coefficients (or partial correlation
coefficients [3,10]). Reflection coefficients occur
naturally in the treatment of the vocal tract as an acoustic
tube with p sections, each with a different cross-sectional
area [2,9]. An important result that will be used in the
sequel is that the conditions

$$-1 < k_i < 1, \qquad 1 \le i \le p, \tag{5}$$

are both necessary and sufficient for the stability of H(z).

The quantity $E_p$ obtained from (4) is the minimum value
of the prediction error given in (1). By expanding the
squared terms in (1) and using (2), it can be shown that the
minimum error is given by

$$E_p = R_0 + \sum_{k=1}^{p} a_k R_k. \tag{6}$$

Of interest also is the normalized error, which is the
ratio of the minimum error to the energy of the input speech
signal, i.e.

$$V_p = \frac{E_p}{R_0}. \tag{7}$$
From (4-a), (4-d) and (7) we obtain

$$V_p = \prod_{j=1}^{p} (1 - k_j^2). \tag{8}$$


The gain G of the filter H(z) is obtained by conserving
the total energy between the speech signal and the impulse
response of H(z). The gain can be shown to satisfy [6]

$$G^2 = R_0 V_p = R_0 + \sum_{k=1}^{p} a_k R_k. \tag{9}$$


Equations  (2),  (3)  and  (9)  completely  specify  the  filter 
parameters.  It  can  be  shown  that  (for  a  well  chosen  p)  the 
resulting  linear  prediction  all-pole  spectrum  is  a  good 
match  to  the  envelope  of  the  signal  spectrum  [6] . 
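
As a small numerical check of (8) and (9), the sketch below computes the normalized error from the reflection coefficients and the gain from the autocorrelation values. The function names are hypothetical and the inputs are assumed to come from an analysis such as the recursion (4).

```python
import numpy as np

def normalized_error(k):
    """V_p = prod_j (1 - k_j^2), as in (8)."""
    k = np.asarray(k, dtype=float)
    return float(np.prod(1.0 - k ** 2))

def gain_from_autocorrelation(R, a):
    """G = sqrt(R_0 + sum_k a_k R_k), as in (9)."""
    R = np.asarray(R, dtype=float)
    a = np.asarray(a, dtype=float)
    E_p = R[0] + np.dot(a, R[1:len(a) + 1])   # minimum error, as in (6)
    return float(np.sqrt(E_p))
```

For consistent inputs the two routes agree, i.e. the squared gain equals R_0 times the normalized error.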


Above we assumed that the speech signal was multiplied
by a finite window. The shape of the window is of importance if
the signal spectrum is to approximate the transfer function
of the vocal tract. This issue is discussed in detail
elsewhere [7]. A smooth window such as the Hamming or
Hanning window is adequate.

When applying the linear prediction method to speech
compression, the model parameters - predictor coefficients,
gain and pitch frequency for voiced sounds - have to be
extracted, quantized and transmitted to the receiver. The
rate of such parameter extraction is usually on the order of
50-100 Hz to follow the time-varying overall characteristics
of the input speech signal. At the receiver, speech is
reconstructed (or synthesized) using the speech production
model given in Fig. 1.

The optimal choice and quantization of transmission
parameters is of prime importance if the resulting
synthesized speech is to be of good quality. In this paper,
several alternate sets of transmission parameters are
considered and their quantization properties are compared.*
This comparative study has indicated that the reflection
coefficients possess many desirable quantization properties.
An optimal method of quantizing the reflection coefficients
is derived using a spectral sensitivity measure. The
sensitivity measure is also used for allocating a fixed
number of bits among the various parameters in an optimal
manner (in a minimax sense). Finally, the use of a second
spectral sensitivity measure for the optimal quantization of
the reflection coefficients is investigated.


*As the quantization properties of pitch and gain are well
understood, we have not considered them in this study.
II. ALTERNATE TRANSMISSION PARAMETER SETS


The all-pole model used in a linear predictive system
has a transfer function

$$H(z) = \frac{G}{A(z)} = \sum_{n=0}^{\infty} h_n z^{-n}, \tag{10}$$

where the inverse filter A(z) is given by

$$A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}. \tag{11}$$


Given below is a list of possible sets of parameters for
characterizing uniquely the linear prediction filter H(z):

(1) Impulse response of the inverse filter A(z), i.e.
predictor coefficients $a_n$, $1 \le n \le p$.

(2) Impulse response of the all-pole model $h_n$, $0 \le n \le p$,
which are easily obtained by long division. Note
that the first p+1 coefficients uniquely specify
the filter.

(3) Autocorrelation coefficients of $\{a_n/G\}$,

$$b_i = \frac{1}{G^2} \sum_{j=0}^{p-|i|} a_j a_{j+|i|}, \qquad a_0 = 1, \quad 0 \le i \le p. \tag{12}$$

(4) Autocorrelation coefficients of $\{h_n\}$,

$$r_i = \sum_{n=0}^{\infty} h_n h_{n+|i|}, \qquad 0 \le i \le p. \tag{13}$$

It can be shown that $r_i$ is equal to $R_i$ in (3) for
$0 \le i \le p$ [6,7].

(5) Spectral coefficients of A(z)/G, $P_i$, $0 \le i \le p$ (or
equivalently spectral coefficients of H(z), $1/P_i$),

$$P_i = b_0 + 2 \sum_{j=1}^{p} b_j \cos \frac{2\pi i j}{2p+1}, \qquad 0 \le i \le p, \tag{14}$$


where $b_i$ are as defined in (12). In words, $\{P_i\}$
is obtained from $\{b_i\}$ through a discrete Fourier
transform (DFT). Traditionally, vocoders that
transmit the spectrum at selected frequencies have
been known as channel vocoders. Thus, use of the
spectral coefficients as transmission parameters
leads to a linear prediction channel vocoder.
While in the classical channel vocoder different
channel signals are derived from contiguous
band-pass filters, in the linear prediction
channel vocoder a selected set of p+1 points from
the all-pole spectrum constitute the "channel
outputs." The main advantage of the linear
prediction channel vocoder, however, is that we
are able to regenerate exactly the all-pole
spectrum from a knowledge of the p+1 spectral
coefficients, unlike in the classical channel
vocoder.

(6) Cepstral coefficients of A(z), $c_n$, $1 \le n \le p$ (or
equivalently cepstral coefficients of H(z)/G, $-c_n$),

$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log A(e^{j\omega}) \, e^{jn\omega} \, d\omega.$$

Since A(z) is minimum phase, we obtain using the
results given in [8, p. 24f]

$$c_n = a_n - \sum_{m=1}^{n-1} \frac{m}{n} \, c_m a_{n-m}, \qquad 1 \le n \le p. \tag{15}$$


(7) Poles of H(z) (or equivalently zeros of A(z)).

(8) Reflection coefficients $k_i$, $1 \le i \le p$, or simple
transformations thereof, e.g. area coefficients
[2,9]. The area coefficients are given by

$$A_i = A_{i+1} \, \frac{1+k_i}{1-k_i}. \tag{16}$$


Although the reflection coefficients are obtained
as a byproduct of the solution in (4), they can
also be computed directly from the predictor
coefficients using the following recursive
relations:

$$k_i = a_i^{(i)}, \qquad a_j^{(i-1)} = \frac{a_j^{(i)} - a_i^{(i)} a_{i-j}^{(i)}}{1 - k_i^2}, \quad 1 \le j \le i-1, \tag{17}$$

where the index i takes values p, p-1, ..., 1 in
that order. Initially, $a_j^{(p)} = a_j$, $1 \le j \le p$.
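
The backward recursion (17) is short enough to sketch directly. The function name `predictor_to_reflection` is hypothetical, and raising an error when some |k_i| >= 1 is a choice made for this illustration (such a coefficient violates the stability condition (5) and would also make the division in (17) ill-defined at |k_i| = 1).

```python
import numpy as np

def predictor_to_reflection(a):
    """Step-down recursion (17): predictor coefficients a_1..a_p -> k_1..k_p."""
    cur = np.array(a, dtype=float)      # a_j^(p) = a_j initially
    p = len(cur)
    k = np.zeros(p)
    for i in range(p, 0, -1):           # i = p, p-1, ..., 1
        k[i - 1] = cur[i - 1]           # k_i = a_i^(i)
        if abs(k[i - 1]) >= 1.0:
            raise ValueError("|k_i| >= 1: the filter H(z) is not stable")
        head = cur[:i - 1]              # a_1^(i)..a_{i-1}^(i)
        # a_j^(i-1) = (a_j^(i) - a_i^(i) a_{i-j}^(i)) / (1 - k_i^2)
        cur = (head - k[i - 1] * head[::-1]) / (1.0 - k[i - 1] ** 2)
    return k
```

Because the recursion either returns all |k_i| < 1 or fails, it doubles as a stability test that avoids polynomial root finding.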

Some of the above sets of parameters have p+1
coefficients while others have only p coefficients.
However, for the latter sets the signal energy (or gain G)
needs to be transmitted, thus keeping the total number of
parameters as p+1 for all the cases. Although the above
sets provide equivalent information about the linear
predictor, their properties under quantization are
different. Certain aspects of the sets (1), (4), (7) and
(8) have been studied in the past [2,10]. Our purpose in
this paper is to investigate the relative quantization
properties of all these parameters with a particular
emphasis on the reflection coefficients.
It should be emphasized that the predictor coefficients
can be recovered from any of the various sets of parameters
listed above. The required transformations for such a
recovery are given below only for the sets (3), (5), (6) and
(8) since they are well-known for the others.


The sequence $\{b_i\}$ is transformed through an FFT after
padding it with an appropriate number of zeros to achieve
sufficient resolution in the resulting spectrum of the
filter A(z)/G. The spectrum of the all-pole filter H(z) is
then obtained by simply inverting the amplitudes of the
computed spectrum. Inverse Fourier transformation of the
spectrum of H(z) yields autocorrelation coefficients $\{r_i\}$
defined in (13). The first p+1 autocorrelation coefficients
$r_i$, $0 \le i \le p$, are then used to compute the predictor
coefficients via the normal equations (2) with $R_i = r_i$, $0 \le i \le p$.
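
The recovery path just described (zero-pad, FFT, invert, inverse FFT, solve the normal equations) can be sketched as follows. This is only an illustration under stated assumptions: the FFT length `n_fft` and the function names are hypothetical, the symmetric sequence b_{-p}..b_p is laid out circularly so that its transform is real, and the normal equations are solved with a standard Levinson-Durbin recursion. With a finite FFT the recovered autocorrelation is slightly aliased, so the result is approximate unless `n_fft` is large compared with the decay time of the impulse response.

```python
import numpy as np

def levinson_durbin(R, p):
    """Solve sum_k a_k R_|i-k| = -R_i; returns (a, E_p)."""
    a = np.zeros(p)
    E = R[0]
    for i in range(1, p + 1):
        prev = a[:i - 1].copy()
        k_i = -(R[i] + np.dot(prev, R[i - 1:0:-1])) / E
        a[:i - 1] = prev + k_i * prev[::-1]
        a[i - 1] = k_i
        E *= 1.0 - k_i ** 2
    return a, E

def predictor_from_b(b, n_fft=1024):
    """Recover (a_1..a_p, G) from b_0..b_p of (12); assumes p >= 1."""
    b = np.asarray(b, dtype=float)
    p = len(b) - 1
    seq = np.zeros(n_fft)
    seq[:p + 1] = b                        # b_0..b_p at lags 0..p
    seq[-p:] = b[:0:-1]                    # b_p..b_1 at lags -p..-1 (circular)
    spec_inverse = np.fft.fft(seq).real    # |A(e^{jw})|^2 / G^2 on the FFT grid
    spec_model = 1.0 / spec_inverse        # power spectrum of H(z)
    r = np.fft.ifft(spec_model).real[:p + 1]   # r_0..r_p of (13), slightly aliased
    a, E_p = levinson_durbin(r, p)
    return a, float(np.sqrt(E_p))          # G^2 = E_p by (9) with R_i = r_i
```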

The predictor coefficients are recovered from the
spectral coefficients $\{P_i\}$ by first taking the inverse DFT
of the sequence $\{P_i\}$ to get the autocorrelation sequence
$\{b_i\}$. The process of getting the predictor coefficients
from $\{b_i\}$ has been discussed above.

Rearranging (15) provides the necessary transformation
from cepstral coefficients to predictor coefficients:

$$a_n = c_n + \sum_{m=1}^{n-1} \frac{m}{n} \, c_m a_{n-m}, \qquad 1 \le n \le p. \tag{18}$$
Equations (15) and (18) also suggest the use of the modified
cepstral coefficients $\hat{c}_n = n c_n$ as possible transmission
parameters.
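
The two recursions (15) and (18) are inverses of each other, which the following sketch makes concrete. The function names are hypothetical; arrays are indexed from zero, so `a[n - 1]` holds a_n and `c[n - 1]` holds c_n.

```python
import numpy as np

def predictor_to_cepstrum(a):
    """Recursion (15): c_n = a_n - sum_{m=1}^{n-1} (m/n) c_m a_{n-m}."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(p)
    for n in range(1, p + 1):
        s = sum((m / n) * c[m - 1] * a[n - m - 1] for m in range(1, n))
        c[n - 1] = a[n - 1] - s
    return c

def cepstrum_to_predictor(c):
    """Recursion (18): a_n = c_n + sum_{m=1}^{n-1} (m/n) c_m a_{n-m}."""
    c = np.asarray(c, dtype=float)
    p = len(c)
    a = np.zeros(p)
    for n in range(1, p + 1):
        s = sum((m / n) * c[m - 1] * a[n - m - 1] for m in range(1, n))
        a[n - 1] = c[n - 1] + s
    return a
```

Multiplying each c_n by n, as in the modified cepstral coefficients, removes the division by n from both loops.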

The predictor coefficients can be recovered from the
reflection coefficients using the relations (4-c) with
i=1,2,...,p, then (4-e).
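
A minimal sketch of this recovery is given below. It assumes that (4-c) is the usual order-update ("step-up") relation, i.e. the inverse of (17): the new last coefficient is k_i and each earlier coefficient a_j is updated by adding k_i times a_{i-j}. The function name is hypothetical.

```python
import numpy as np

def reflection_to_predictor(k):
    """Step-up recursion: reflection coefficients k_1..k_p -> a_1..a_p."""
    a = np.zeros(0)                      # order 0: no predictor coefficients
    for k_i in k:                        # i = 1, 2, ..., p
        # a_j^(i) = a_j^(i-1) + k_i a_{i-j}^(i-1), then a_i^(i) = k_i
        a = np.concatenate((a + k_i * a[::-1], [k_i]))
    return a                             # a_j = a_j^(p)
```

For p = 2 this gives a_1 = k_1 (1 + k_2) and a_2 = k_2.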
III. PREPROCESSING METHODS


Before we discuss the quantization properties of the
different parameters we should mention that such properties
can be improved by proper preprocessing, which is later
undone at the synthesizer. For each set of parameters (1-8
above) we have observed that the short-time spectral dynamic
range of the speech signal is the single most important
factor that affects the quantization properties. We use two
methods of preprocessing to reduce the spectral dynamic
range and thereby to improve the quantization properties
[11]. In the first method, optimal (linear predictive)
preemphasis is applied to the speech signal, which reduces
the spectral dynamic range by reducing the general spectral
slope. The second method, called the SIGMA method, involves
multiplying the impulse response of the inverse filter A(z)
by a decaying exponential, which increases the pole
bandwidths, resulting in a reduction of the spectral dynamic
range*. Preprocessing by either of these methods can be
done after the linear prediction analysis, so that it can be
viewed as part of the encoding process.
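
The effect of the second method on the poles can be illustrated with a short sketch. It assumes the decaying exponential takes the form r^n with 0 < r < 1 applied to the coefficient of z^-n; the parameter name `r` and the function names are hypothetical, and the exact parameterization used in the SIGMA method is not given in this section.

```python
import numpy as np

def expand_bandwidths(a, r):
    """Multiply a_n by r**n (0 < r < 1): every zero of A(z) is scaled by r."""
    a = np.asarray(a, dtype=float)
    n = np.arange(1, len(a) + 1)
    return a * r ** n

def undo_expansion(a_expanded, r):
    """Inverse operation, as would be applied at the synthesizer."""
    a_expanded = np.asarray(a_expanded, dtype=float)
    n = np.arange(1, len(a_expanded) + 1)
    return a_expanded / r ** n
```

Since the modified inverse filter is A(z/r), each pole of H(z) moves radially toward the origin by the factor r, which widens its bandwidth while leaving its angle (frequency) unchanged.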


*If, however, a growing exponential is used, the pole
... tracking [6,7].
IV. QUANTIZATION PROPERTIES


For the purpose of quantization, two desirable
properties for a parameter set to have are: (a) filter
stability upon quantization and (b) a natural ordering of
the parameters. Property (a) means that the poles of H(z)
continue to be inside the unit circle even after parameter
quantization. By (b) we mean that the parameters exhibit an
inherent ordering, e.g. the predictor coefficients are
ordered as $a_1, a_2, \ldots, a_p$. If $a_1$ and $a_2$ are interchanged
then H(z) is no longer the same in general, thus
illustrating the existence of an ordering. When such an
ordering is present, a statistical study on the distribution
of individual parameters can be used to develop better
quantization schemes. It is clear that property (a) is more
important than (b). Only the poles and the reflection
coefficients ensure stability upon quantization, while all
the sets of parameters except the poles possess a natural
ordering. Thus, only the reflection coefficients possess
both of these properties.


We have investigated experimentally the quantization
properties of the sets of parameters discussed in Section
II, with and without preprocessing of the speech signal.
The absolute error between the log power spectra of the
unquantized and the quantized linear predictors was used as
a criterion in this study, since we believe that a good
spectral match is necessary for synthesizing speech with
good quality. A summary of the results is provided in the
following.


The impulse responses $\{a_n\}$ and $\{h_n\}$ are highly
susceptible to causing instability of the filter upon
quantization. This is well known from discrete filter
analysis. Positive definiteness of autocorrelation
coefficients $\{b_i\}$ and $\{r_i\}$ is not ensured under
quantization, which also leads to instabilities in the
linear prediction filter. An attempt to synthesize speech
with quantized autocorrelation coefficients $\{r_i\}$ resulted in
distinctly perceivable "clicks" in the synthesized speech.
Our conclusion is that the impulse responses and
autocorrelation coefficients can be used only under minimal
quantization, in which case the transmission rate would be
excessive.
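
The contrast between quantizing predictor coefficients and quantizing reflection coefficients can be reproduced on a toy case. The sketch below is an illustration only, not the experiment reported here: the 2-pole filter, the step sizes (4 bits over the ranges (-2,2) and (-1,1)) and the 3-bit mid-rise grid for the reflection coefficients are all assumptions chosen to make the point visible.

```python
import itertools
import numpy as np

def reflection_to_predictor(k):
    """Step-up recursion from reflection to predictor coefficients."""
    a = np.zeros(0)
    for k_i in k:
        a = np.concatenate((a + k_i * a[::-1], [k_i]))
    return a

def is_stable(a):
    """True if all zeros of A(z) = 1 + sum a_n z^-n lie inside the unit circle."""
    roots = np.roots(np.concatenate(([1.0], a)))
    return bool(np.all(np.abs(roots) < 1.0))

# (a) Rounding predictor coefficients of a stable 2-pole filter.
a = np.array([-1.9, 0.93])
a_q = np.array([np.round(a[0] / 0.25) * 0.25,      # a_1 on a grid of step 0.25
                np.round(a[1] / 0.125) * 0.125])   # a_2 on a grid of step 0.125
predictor_case = (is_stable(a), is_stable(a_q))

# (b) Every combination of quantized reflection coefficients, p = 3.
levels = (np.arange(8) + 0.5) * 0.25 - 1.0         # -0.875, -0.625, ..., 0.875
reflection_case = all(
    is_stable(reflection_to_predictor(k))
    for k in itertools.product(levels, repeat=3)
)
```

In case (a) the rounded coefficients describe a filter with a pole outside the unit circle, whereas in case (b) stability follows from condition (5) for any grid whose levels stay strictly inside (-1,1).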

In the experimental investigation of the spectral and
cepstral parameters, we found that the quantization
properties of these parameters are generally superior to
those of the impulse responses and autocorrelation
coefficients. The spectral parameters often yield results
comparable to those obtained by quantizing the reflection
coefficients. However, for the cases when the spectrum
consists of one or more very sharp peaks (narrow
bandwidths), the effects of quantizing the spectral
coefficients often cause certain regions in the
reconstructed spectrum to become negative, which leads to
instability of the filter. Preprocessing the speech signal
by the SIGMA method remedies this situation, but the
spectral deviation in these regions can be relatively large.
Quantization of cepstral parameters can also lead to
instabilities. As before, with proper preprocessing
stability is restored, but at the expense of increased
spectral deviation.


As mentioned earlier, the stability of the filter H(z)
is guaranteed under quantization of the poles. This makes
the poles potentially a good set of parameters for
transmission. Unfortunately, the poles do not possess a
natural ordering: a property that is necessary if a low
transmission rate is desired. Traditionally, poles have
been ordered in terms of vocal tract resonances (formants).
Since the ranges of frequencies for the various formants
have been well established, their quantization can be done
with improved accuracy. In addition, the formant bandwidths
may be quantized less accurately than formant frequencies,
which leads to further savings in transmission rate.
However, experience has shown that the problem of
identifying the poles as ordered formants is computationally
complex and involves a fair amount of decision making which
is not completely reliable. In addition, computing the
poles requires finding the roots of a pth order polynomial
(p~12): not a straightforward task.

Based on the results of our experimental study of the
spectral deviation due to quantization, on computational
considerations, and on stability and natural ordering
properties, we conclude that the reflection coefficients are
the best set for use as transmission parameters. The
question now is, what is an optimal quantization scheme for
the reflection coefficients which gives the best results in
terms of the quality of the synthesized speech? To this end,
we perform in the next section a spectral sensitivity
analysis of the reflection coefficients, since we have
assumed that good quality speech depends on an accurate
representation of the power spectrum. Based on the results
of this study we present in Section VI an optimal scheme for
the quantization of the reflection coefficients.
V. SENSITIVITY ANALYSIS OF REFLECTION COEFFICIENTS

In order to understand the effects of parameter
quantization on the all-pole model spectrum, we study in
this section the sensitivity of the spectrum to small
changes in the reflection coefficients. If $\Delta S$ is the
spectral deviation due to a change $\Delta k_i$ in the reflection
coefficient $k_i$, then we define the spectral sensitivity for
the coefficient $k_i$ as

$$\frac{\partial S}{\partial k_i} = \lim_{\Delta k_i \to 0} \left| \frac{\Delta S}{\Delta k_i} \right|. \tag{19}$$


The definition of spectral deviation $\Delta S$ can be arbitrary,
but for it to be useful it must somehow relate in a
proportional manner to the corresponding effect on
perception of the synthesized speech. Here we employ a
measure of spectral deviation that has been found to be
useful in speech research, namely, the average of the
absolute value of the difference between the two log spectra
under consideration. Thus the spectral sensitivity is
defined by

$$\frac{\partial S}{\partial k_i} = \lim_{\Delta k_i \to 0} \left| \frac{1}{\Delta k_i} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \big| \log P(k_i, \omega) - \log P(k_i + \Delta k_i, \omega) \big| \, d\omega \right] \right|,$$

or

$$\frac{\partial S}{\partial k_i} = \lim_{\Delta k_i \to 0} \left| \frac{1}{\Delta k_i} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \log \frac{P(k_i, \omega)}{P(k_i + \Delta k_i, \omega)} \right| d\omega \right] \right|, \tag{20}$$

where

$$P(k_i, \omega) = \big| H(e^{j\omega}) \big|^2$$

is the spectrum of the all-pole model H(z). The quantity
between brackets in (20) is the spectral deviation $\Delta S$ due to
a perturbation in the ith reflection coefficient.
Experimentally, $\partial S / \partial k_i$ is computed by replacing the integral
by a summation, and by using a sufficiently small value for
$\Delta k_i$.
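
The experimental procedure (summation over a frequency grid and a small finite perturbation) can be sketched as follows. Several details are assumptions made for this illustration and are not taken from the text: the gain is fixed at G = 1, natural logarithms are used, the grid has `n_freq` uniformly spaced frequencies, the perturbation is `dk`, and the index `i` is zero-based. The function names are hypothetical.

```python
import numpy as np

def reflection_to_predictor(k):
    """Step-up recursion from reflection to predictor coefficients."""
    a = np.zeros(0)
    for k_i in k:
        a = np.concatenate((a + k_i * a[::-1], [k_i]))
    return a

def log_spectrum(k, n_freq=4096):
    """log P(w) = -log |A(e^{jw})|^2 on a uniform grid (gain G = 1 assumed)."""
    a = reflection_to_predictor(k)
    A = np.fft.fft(np.concatenate(([1.0], a)), n_freq)   # zero-padded FFT of A(z)
    return -np.log(np.abs(A) ** 2)

def spectral_sensitivity(k, i, dk=1e-4, n_freq=4096):
    """Finite-difference estimate of (20) for the coefficient k[i]."""
    k = np.asarray(k, dtype=float)
    k_pert = k.copy()
    k_pert[i] += dk
    # Mean absolute log-spectral difference approximates the bracket in (20).
    deviation = np.mean(np.abs(log_spectrum(k, n_freq) - log_spectrum(k_pert, n_freq)))
    return deviation / dk
```

Sweeping one entry of `k` over (-1,1) while holding the others fixed traces out one sensitivity curve of the kind discussed in this section.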

Typical sensitivity curves are shown in Fig. 3. (For
display purposes we have plotted $10 \log_{10} \partial S / \partial k_i$ in decibels.)
These curves were obtained from a 12-pole linear predictive
analysis of a 20 msec frame from a 10 kHz sampled speech
signal. Each curve in Fig. 3 is a plot of the spectral
sensitivity for one of the 12 reflection coefficients as its
value is varied over the range (-1,1) while the other 11
reflection coefficients are kept constant. We have
performed this type of sensitivity analysis for a large
number of different sounds recorded from different speakers.
The resulting sensitivity curves were similar to those shown
in Fig. 3.

Fig. 3. Typical spectral sensitivity curves for the
reflection coefficients of a 12-pole analysis
of a 20 msec speech frame.

The sensitivity curves have the following
properties in common:

(i) Each sensitivity curve versus $k_i$ has the same
general shape irrespective of the index i and
irrespective of the values of the other
coefficients $k_n$, $n \ne i$, at which the sensitivity is
computed.

(ii) Each sensitivity curve is U-shaped. It is
even-symmetric about $k_i = 0$, and has large values
when the magnitude of $k_i$ is close to 1 and small
values when the magnitude of $k_i$ is close to zero.


It has been observed by some researchers that the first
few reflection coefficients are the most sensitive to the
effects of quantization. While this is true, it is clear
from the results of our sensitivity analysis that the high
sensitivity is not due to the fact that these reflection
coefficients are the leading ones but because on the average
they assume magnitudes closer to 1 than the others.


The sensitivity properties given above strongly suggest
the existence of a prototype sensitivity function which
would apply approximately to every reflection coefficient
and for different speech sounds. Such a prototype function
could then be used in developing an optimal quantization
scheme that would apply to all reflection coefficients all
the time. Due to the above sensitivity properties, it is
meaningful to obtain this prototype sensitivity function as
the simple average of the sensitivity curves over different
reflection coefficients and for a large number of different
speech sounds. Such an averaged sensitivity function is
defined below:


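$$\overline{\frac{\partial S}{\partial k}} = \operatorname{Ave}_{t} \left[ \frac{1}{p} \sum_{i=1}^{p} \left. \frac{\partial S}{\partial k_i} \right|_{k_i = k} \right], \tag{21}$$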


where t refers to the number of the analysis frame (time
averaging). The averaged sensitivity function for a
representative speech sample is shown plotted as the solid
curve in Fig. 4. In this plot the sensitivity values are
given in decibels relative to the sensitivity at k=0. In
the next section, we develop an optimal quantization scheme
for the reflection coefficients using the averaged
sensitivity function in Fig. 4.


Fig. 4. Averaged spectral sensitivity curve for the
reflection coefficients (solid line) and an
analytical function that approximates it
(dashed line).
VI. OPTIMAL QUANTIZATION OF REFLECTION COEFFICIENTS

In view of the sensitivity properties of the reflection
coefficients discussed in the previous section and depicted
in Figs. 3 and 4, it is clear that linear quantization of
the reflection coefficients is not satisfactory, especially
when some of them take values close to 1 in magnitude. What
is needed is a nonlinear quantization scheme that is much
more sensitive (has more steps) near ±1 than near 0. A
nonlinear quantization of a reflection coefficient is
equivalent to a linear quantization of a different parameter
that is related to the reflection coefficient by a nonlinear
transformation. We define an optimal transformation as one
which results in a transformed parameter that has a flat or
constant spectral sensitivity behavior. We shall now use
the results of the previous section to determine this
optimal transformation.

Denoting the transformed parameter as g, we have

$$g = f(k), \tag{22}$$

where $f(\cdot)$ is the underlying nonlinear mapping. The optimal
mapping is one where the transformed parameter g has
constant spectral sensitivity, i.e.

$$\frac{\partial S}{\partial g} = L = \text{a constant}, \tag{23}$$
where the sensitivity is defined in an analogous manner to
(20). Writing formally,

$$\frac{\partial S}{\partial g} = \frac{\partial S}{\partial k} \, \frac{dk}{dg} = \frac{\partial S}{\partial k} \Big/ \frac{df(k)}{dk}. \tag{24}$$

Thus, from (23) and (24) we have

$$\frac{df(k)}{dk} = \frac{1}{L} \, \frac{\partial S}{\partial k}. \tag{25}$$


Equation (25) provides the condition for a mapping to be
optimal. The optimal mapping f(k) is obtained by simply
integrating (25). It is clear that (25) may be applied to
each reflection coefficient separately. However, for the
reasons mentioned in the last section we shall consider the
averaged sensitivity function in Fig. 4 and derive the
mapping that is optimal on the average for all the
reflection coefficients.

Although it is possible to obtain the optimal
transformation by integrating the solid curve in Fig. 4
directly, we have found it simpler and ultimately more
useful to approximate the averaged sensitivity curve by a
well specified mathematical function which could then be
integrated to obtain an approximately optimal f(k). An
experimental fitting of the averaged sensitivity curve in
Fig. 4 has revealed that the function $1/(1-k^2)$ approximates
the sensitivity function reasonably well (to within a
multiplicative constant), as shown by the dashed curve in
Fig. 4. (Note that the plot is given in decibels.) Thus,
from (25), the approximately optimal transformation is given
by

$$\frac{df(k)}{dk} = \frac{1}{L \, (1 - k^2)}. \tag{26}$$

Integrating (26) we obtain

$$f(k) = \frac{1}{2L} \log \frac{1+k}{1-k}. \tag{27}$$

As L is arbitrary, an interesting transformation is obtained
by substituting L=1/2:

$$f(k) = \log \frac{1+k}{1-k}. \tag{28}$$


From (16), the ratio of consecutive area coefficients is
given by

$$\frac{A_i}{A_{i+1}} = \frac{1+k_i}{1-k_i}, \qquad 1 \le i \le p. \tag{29}$$


Therefore, the transformation in (28) is simply the
logarithm of the area ratio. Thus, we have shown that the
logarithms of the area ratios (henceforth called log area
ratios) provide an approximately optimal set of coefficients
for quantization.
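
A minimal sketch of quantization in the log area ratio domain follows. The transformation is (28); the inverse k = tanh(g/2) follows from it algebraically. The uniform mid-rise quantizer, its range limit `g_max` and the word length `n_bits` are assumptions introduced only for this illustration.

```python
import numpy as np

def log_area_ratio(k):
    """g = log((1 + k) / (1 - k)), as in (28); requires |k| < 1."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))

def inverse_log_area_ratio(g):
    """k = tanh(g / 2); always inside (-1, 1), so stability is preserved."""
    return np.tanh(np.asarray(g, dtype=float) / 2.0)

def quantize_via_lar(k, g_max, n_bits):
    """Uniformly quantize g on [-g_max, g_max) and map back to k."""
    step = 2.0 * g_max / 2 ** n_bits
    g = np.clip(log_area_ratio(k), -g_max, g_max - 1e-12)
    index = np.floor((g + g_max) / step)          # integer code 0 .. 2**n_bits - 1
    g_hat = (index + 0.5) * step - g_max          # reconstruction at cell centers
    return inverse_log_area_ratio(g_hat)
```

Equal steps in g correspond to steps in k that shrink roughly in proportion to (1 - k^2), which is the behavior called for by the sensitivity analysis.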

Fig. 5 shows sensitivity curves for the log area ratios
using the same example as in Fig. 3. A comparison of
Figs. 3 and 5 shows that the sensitivity curves are
relatively flat for the log area ratios. Our experimental
investigations into the quality of the synthesized speech
also indicate that the log area ratios possess good
quantization properties.

Fig. 6 shows a plot of the log area ratio as a function
of the reflection coefficient. We have also plotted in
Fig. 6 a linear characteristic that passes through the
intersection of a vertical line at k=0.7 and the log area
ratio curve. For values of k less than 0.7 in magnitude,
the log area ratio curve is almost linear. Thus, if a
certain reflection coefficient takes values always less than
0.7 in magnitude, one could quantize it linearly to obtain
approximately flat sensitivity characteristics. In practice
it is found that the reflection coefficients $k_i$, i>3, have
in general magnitudes less than 0.7. However, use of the
log area ratios automatically leads to the desired
quantization irrespective of the reflection coefficient and
the range of values it spans.


Fig. 6. Log area ratio plotted as a function of the
reflection coefficient (solid line) and a
linear characteristic that intersects it at
k=0.7 (dashed line).
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Interpretation  in  terms  of  Pole  Locations 

While the spectral sensitivity measure given by (20) is
useful in quantifying the overall deviation in the spectrum
due to perturbations in the reflection coefficients or the
log area ratios, it does not explain the corresponding
deviations in the pole locations of the linear prediction
filter. Much is known about the relations between the
accuracy of pole (or formant) locations and the
corresponding effects on speech quality. Therefore, it
would be useful to examine the pole deviations due to
quantization of the transmission parameters. Unfortunately,
the problem is quite intractable in general. However, some
insight can still be gained by examining a 2-pole model.
Although it is possible to examine this model in
mathematical terms, here we shall take a graphical approach
due to Kitawaki and Itakura [12].

For the second order linear predictor, the inverse
filter is given by

$$ A(z) = 1 + k_1 (1 + k_2) z^{-1} + k_2 z^{-2} . \tag{30} $$

The zeros of A(z) are the poles of our model filter H(z).
We shall restrict our discussion to the cases where the
zeros form complex conjugate pairs. From (30) we see that
A(z) has a complex zero when

$$ k_1 \in (-1, 1) , \qquad k_2 \in \left( \frac{2 - k_1^2 - 2\sqrt{1 - k_1^2}}{k_1^2} , \; 1 \right) . \tag{31} $$
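The condition in (31) can be cross-checked against the discriminant of the quadratic in (30). The Python sketch below is illustrative only; the function names are hypothetical, and the value returned at $k_1 = 0$ is the limit of the bound rather than something stated in the text.

```python
import math


def k2_lower_bound(k1: float) -> float:
    """Lower limit on k2 in (31): (2 - k1^2 - 2 sqrt(1 - k1^2)) / k1^2."""
    if k1 == 0.0:
        return 0.0  # limiting value of the expression as k1 -> 0
    return (2.0 - k1 * k1 - 2.0 * math.sqrt(1.0 - k1 * k1)) / (k1 * k1)


def has_complex_zeros(k1: float, k2: float) -> bool:
    """Direct test: the discriminant of z^2 + k1 (1 + k2) z + k2 is negative."""
    b = k1 * (1.0 + k2)
    return b * b - 4.0 * k2 < 0.0


if __name__ == "__main__":
    for k1, k2 in [(0.6, 0.05), (0.6, 0.2), (-0.8, 0.2), (-0.8, 0.3)]:
        print(k1, k2, has_complex_zeros(k1, k2), k2 > k2_lower_bound(k1))
```

For each pair the two printed truth values should agree, since the bound in (31) is the smaller root in $k_2$ of the discriminant set to zero.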


Fig. 7 shows a plot of only the complex zeros as $k_1$ is
varied uniformly in the interval (-.99, .99) in equal steps
of .01 while $k_2$ is varied uniformly in the interval [0, .99]
also in equal steps of .01. Let

$$ g_i = \log \frac{1 + k_i}{1 - k_i} , \qquad i = 1, 2 , \tag{32} $$

be the log area ratios corresponding to $k_1$ and $k_2$. Fig. 8
depicts the complex zeros of A(z) when $g_1$ is varied over
[-log 199, log 199], and $g_2$ over [0, log 199], uniformly and
in equal steps. The total number of steps is kept the same
as in the previous case. Relative to Fig. 7, Fig. 8 shows
that there is denser clustering of the zeros near the unit
circle and for angles close to 0 and π. This means that in
these regions, quantization errors in the log area ratios
lead to a smaller deviation in the position of zeros of A(z)
than that obtained by the quantization of the reflection
coefficients, assuming the same number of quantization
levels in both cases. Fig. 9 shows the complex roots
obtained by a second order linear predictive analysis of
several sentences of speech material sampled at 10 kHz. An
inspection of Figs. 8 and 9 reveals that the roots of the
second order linear predictor for the continuous speech
occur mainly in the areas where there is a dense clustering
of zeros in Fig. 8. We view this as further independent
evidence supporting our earlier findings of the desirable
quantization properties of the log area ratios for the
purpose of speech compression.

BBN Report No. 2800
Bolt Beranek and Newman Inc

Fig. 7. Root loci for a second order all-pole system obtained
by varying the two reflection coefficients in equal
steps. (After Kitawaki and Itakura [12].)

Fig. 9. Complex roots obtained by a second order linear
predictive analysis of about 30 secs of continuous
speech sampled at ...
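The clustering described for Figs. 7 and 8 can be examined numerically. The Python sketch below is an illustration only, not the procedure used to draw the figures. It builds the two grids described in the text (uniform steps in $k_1$, $k_2$ and uniform steps in $g_1$, $g_2$, with the same number of steps) and compares the fraction of complex zeros lying outside an arbitrarily chosen radius of 0.9. The radius threshold and all function names are assumptions, and only the radial clustering is checked, not the clustering in angle.

```python
import cmath
import math


def inverse_log_area_ratio(g: float) -> float:
    """Reflection coefficient k with log((1 + k) / (1 - k)) = g, i.e. k = tanh(g / 2)."""
    return math.tanh(g / 2.0)


def uniform_grid(lo: float, hi: float, count: int) -> list:
    """count equally spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * n / (count - 1) for n in range(count)]


def complex_zeros(k1_values, k2_values) -> list:
    """Upper-half-plane zeros of A(z) = 1 + k1 (1 + k2) z^-1 + k2 z^-2, complex pairs only."""
    zeros = []
    for k1 in k1_values:
        for k2 in k2_values:
            b = k1 * (1.0 + k2)
            disc = b * b - 4.0 * k2
            if disc < 0.0:
                zeros.append((-b + cmath.sqrt(disc)) / 2.0)
    return zeros


def fraction_near_unit_circle(zeros, radius: float = 0.9) -> float:
    """Fraction of the zeros whose magnitude exceeds radius."""
    return sum(1 for z in zeros if abs(z) > radius) / len(zeros)


K_MAX = 0.99
G_MAX = math.log((1.0 + K_MAX) / (1.0 - K_MAX))  # log 199

# Uniform steps in the reflection coefficients (the grid described for Fig. 7).
zeros_k = complex_zeros(uniform_grid(-K_MAX, K_MAX, 199),
                        uniform_grid(0.0, K_MAX, 100))

# Uniform steps in the log area ratios (the grid described for Fig. 8).
zeros_g = complex_zeros(
    [inverse_log_area_ratio(g) for g in uniform_grid(-G_MAX, G_MAX, 199)],
    [inverse_log_area_ratio(g) for g in uniform_grid(0.0, G_MAX, 100)],
)

frac_k = fraction_near_unit_circle(zeros_k)
frac_g = fraction_near_unit_circle(zeros_g)

if __name__ == "__main__":
    print(f"uniform k grid: {frac_k:.3f} of complex zeros have radius > 0.9")
    print(f"uniform g grid: {frac_g:.3f} of complex zeros have radius > 0.9")
```

Because the product of the two zeros equals $k_2$, the radius of a complex pair is $\sqrt{k_2}$, so the larger fraction for the log area ratio grid reflects the finer spacing of $k_2$ values near 1.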

Kitawaki  and  Itakura  considered  still  other  nonlinear 
mappings  of  the  reflection  coefficients  but  concluded  that 
the  log  area  ratios  lead  to  the  best  overall  quantization 
accuracy  [12].  Our  results  make  the  stronger  statement  that 
the  log  area  ratios  are  actually  optimal  in  the  sense 
discussed  earlier. 

Optimum  Bit  Allocation 

In the following we investigate the use of the spectral
sensitivity measure for allocating a fixed number of bits
among the various parameters. Let $q_1, q_2, \ldots, q_p$ be the
parameters chosen for quantization. These may be the
reflection coefficients or the log area ratios or any other
set of parameters. Given the total number of bits for
quantization as M, the problem is to distribute this among
the p parameters as $M_i$, $1 \le i \le p$, in some optimal manner. In
terms of quantization levels, the above problem may be
restated as the allocation of $N = 2^M$ levels among the p
parameters as $N_i$, $1 \le i \le p$, in some optimal manner. Therefore,
we have

$$ \sum_{i=1}^{p} M_i = \sum_{i=1}^{p} \log_2 N_i = M , \tag{33} $$

$$ \prod_{i=1}^{p} N_i = N , \qquad N_i = 2^{M_i} , \quad 1 \le i \le p . $$

We propose to derive the optimal bit allocation by
minimizing the maximum spectral deviation due to
quantization. The total spectral deviation $\Delta S$ due to
changes $\Delta q_i$ in the parameters $q_i$, $1 \le i \le p$, is given
approximately by

$$ \Delta S = \sum_{i=1}^{p} \frac{\partial S}{\partial q_i} \, |\Delta q_i| . \tag{34} $$


Define the quantization step size for $q_i$ as

$$ \delta_i = \frac{\bar{q}_i - \underline{q}_i}{N_i} , \tag{35} $$

where $\bar{q}_i$ and $\underline{q}_i$ are the upper and lower bounds on $q_i$,
respectively. Then, for a linear quantization of $q_i$ using
round-off arithmetic, the maximum quantization error is
equal to half the quantization step size:

$$ |\Delta q_i|_{\max} = \frac{1}{2} \delta_i . $$

Thus

$$ (\Delta S)_{\max} = \frac{1}{2} \sum_{i=1}^{p} \frac{\partial S}{\partial q_i} \, \delta_i . \tag{36} $$


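Let

$$ K_i = \frac{1}{2} \frac{\partial S}{\partial q_i} \, (\bar{q}_i - \underline{q}_i) , \qquad 1 \le i \le p . \tag{37} $$

Then

$$ (\Delta S)_{\max} = \sum_{i=1}^{p} \frac{K_i}{N_i} . \tag{38} $$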


We wish to minimize $(\Delta S)_{\max}$ with respect to $\{N_i\}$ subject to
the constraint

$$ \sum_{i=1}^{p} \log_2 N_i = M . \tag{39} $$

This is a simple problem in constrained minimization [13]
and its solution is given by

$$ N_1 = K_1 \left[ \frac{2^M}{\prod_{i=1}^{p} K_i} \right]^{1/p} , \tag{40} $$

$$ N_i = \frac{K_i}{K_1} N_1 , \qquad 2 \le i \le p . $$

The bit allocation strategy given in (40) is thus optimal in
a minimax sense since it minimizes the maximum spectral
deviation due to quantization. Note that if truncation
arithmetic is used, the constants $K_i$ in (37) will be
doubled, but that will not affect the bit allocation results
from (40).
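The allocation in (40) is straightforward to compute. The Python sketch below is a minimal illustration with hypothetical function names and made-up constants $K_i$. It returns real-valued numbers of levels; how these would be rounded to whole levels in a practical coder is not addressed in the text and is left out here.

```python
import math


def allocate_levels(K: list, total_bits: float) -> list:
    """Level allocation of (40): N_i = K_i * (2^M / prod_j K_j)^(1/p).

    Worked in the log2 domain to avoid overflow for large M.
    """
    p = len(K)
    if p == 0 or any(k <= 0.0 for k in K):
        raise ValueError("K must be a non-empty list of positive constants")
    log2_scale = (total_bits - sum(math.log2(k) for k in K)) / p
    return [k * 2.0 ** log2_scale for k in K]


def max_spectral_deviation(K: list, N: list) -> float:
    """Maximum spectral deviation of (38): sum_i K_i / N_i."""
    return sum(k / n for k, n in zip(K, N))


if __name__ == "__main__":
    K = [4.0, 2.0, 1.0]          # illustrative constants, not taken from the report
    M = 9                        # total number of bits
    N = allocate_levels(K, M)
    print("levels:", N)
    print("bits per parameter:", [math.log2(n) for n in N])
    print("max deviation, allocation (40):", max_spectral_deviation(K, N))
    print("max deviation, equal split:", max_spectral_deviation(K, [8.0, 8.0, 8.0]))
```

With these illustrative constants every term $K_i/N_i$ comes out equal, and the resulting maximum deviation is smaller than that of an equal split of the same nine bits.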

The optimal bit allocation in (40) effectively says
that the contributions of the different parameters to the
maximum spectral deviation in (38) must be equal. We know
that for the log area ratios the spectral sensitivity
$\partial S / \partial g_i$ is approximately a constant and is the same for
all the coefficients. From (35), (37) and (40), this implies
that the quantization step size $\delta_i$ should be the same for
all the log area ratios, which is intuitively clear. For
this case, the bit allocation can be done as follows.
Compute the constant step size $\delta$ from

$$ \delta = \left[ \frac{\prod_{i=1}^{p} (\bar{q}_i - \underline{q}_i)}{2^M} \right]^{1/p} . \tag{41} $$

Then the number of levels $N_i$ for each coefficient is
computed from (35). We have found it convenient and useful
to begin with a particular step size. That automatically
determines the total number of bits needed, as well as the
maximum spectral deviation which, in turn, determines the
resulting speech quality. One can then study the change in
speech quality as a function of only one variable, namely
the step size.
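Both directions of this procedure can be sketched in a few lines of Python: computing the common step size from a bit budget as in (41), and starting from a chosen step size to find the levels and bits it implies. This is an illustration only. The function names and parameter ranges are hypothetical, and rounding the number of levels up to a whole number is an assumption of the sketch, not something specified in the text.

```python
import math


def common_step_size(ranges: list, total_bits: float) -> float:
    """Step size of (41): delta = (prod_i range_i / 2^M)^(1/p)."""
    p = len(ranges)
    if p == 0 or any(r <= 0.0 for r in ranges):
        raise ValueError("ranges must be a non-empty list of positive widths")
    log2_delta = (sum(math.log2(r) for r in ranges) - total_bits) / p
    return 2.0 ** log2_delta


def levels_for_step(ranges: list, delta: float) -> list:
    """Levels N_i = range_i / delta from (35), rounded up (rounding is an assumption)."""
    return [math.ceil(r / delta) for r in ranges]


def bits_needed(levels: list) -> float:
    """Total number of bits, sum_i log2 N_i, as in (39)."""
    return sum(math.log2(n) for n in levels)


if __name__ == "__main__":
    ranges = [8.0, 4.0, 2.0]     # illustrative parameter ranges, not from the report
    delta = common_step_size(ranges, total_bits=3)
    levels = levels_for_step(ranges, delta)
    print("step size:", delta)
    print("levels:", levels, "total bits:", bits_needed(levels))
```

Starting instead from a chosen `delta` and calling `levels_for_step` followed by `bits_needed` corresponds to the approach of fixing the step size first and reading off the bit rate.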

VII. COMMENTS ON ANOTHER SPECTRAL SENSITIVITY MEASURE

In Section V we introduced a spectral sensitivity
measure to study the quantization properties of the
reflection coefficients. Other types of sensitivity
measures may also be used. In particular we have considered
a measure which is similar to the total-squared error used
for minimization in linear predictive analysis. By using
Parseval's theorem in (1), the total-squared error is given
by

$$ E = \frac{G^2}{2\pi} \int_{-\pi}^{\pi} \frac{P_0(\omega)}{P(\omega)} \, d\omega , \tag{42} $$

where $P_0(\omega)$ is the power spectrum of the input speech signal
and $P(\omega)$ is the power spectrum of the all-pole filter:

$$ P(\omega) = |H(e^{j\omega})|^2 = \frac{G^2}{|A(e^{j\omega})|^2} . \tag{43} $$

The gain G is given by (9).

Properties of the error measure E have been studied in
detail elsewhere [6,7,14]. In particular, the minimization
of E results in an all-pole model spectrum $P(\omega)$ that is a
good approximation to the envelope of the signal spectrum
$P_0(\omega)$. Because of this property, it seemed reasonable to
study the use of this error E as a measure of the deviation
between the two spectra. For the sake of normalization we
have chosen to work with an error measure E' obtained from
(42) by eliminating the factor $G^2$:

$$ E' = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{P_1(\omega)}{P_2(\omega)} \, d\omega , \tag{44} $$

where $P_1(\omega)$ and $P_2(\omega)$ are now any two spectra. Also, the
two spectra are normalized such that they have equal total
energy.

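For our study of spectral sensitivity we let
$P_1(\omega) = P(k_i, \omega)$ and $P_2(\omega) = P(k_i + \Delta k_i, \omega)$, where $P(\cdot, \omega)$ is given
by (43). The error between the two spectra is then given by

$$ E'(\Delta k_i) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{P(k_i, \omega)}{P(k_i + \Delta k_i, \omega)} \, d\omega . \tag{45} $$

We define the spectral deviation, then, as

$$ \Delta S' = \log E'(\Delta k_i) . \tag{46} $$

The definition of the new measure of spectral sensitivity
follows from (46) and (45) as

$$ \frac{\partial S'}{\partial k_i} = \lim_{\Delta k_i \to 0} \frac{1}{\Delta k_i} \log \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{P(k_i, \omega)}{P(k_i + \Delta k_i, \omega)} \, d\omega \right] . \tag{47} $$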

The spectral sensitivity in (47) can be derived
analytically, without the need to resort to experimental
data as was the case for the study of $\partial S / \partial k_i$ in (20). This is
done below.

$E'(\Delta k_i)$ in (45) can be interpreted as the arithmetic
mean of the ratio of the two spectra. For small $\Delta k_i$, the
arithmetic mean is approximately equal to the geometric
mean, which is given by

$$ E''(\Delta k_i) = \exp \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log \frac{P(k_i, \omega)}{P(k_i + \Delta k_i, \omega)} \, d\omega \right] . \tag{48} $$


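As $\Delta k_i \to 0$, the arithmetic mean becomes equal to the geometric
mean. Using this result, we have from (45), (47) and (48),

$$ \frac{\partial S'}{\partial k_i} = \lim_{\Delta k_i \to 0} \frac{1}{\Delta k_i} \, \frac{1}{2\pi} \int_{-\pi}^{\pi} \log \frac{P(k_i, \omega)}{P(k_i + \Delta k_i, \omega)} \, d\omega . \tag{49} $$

Substituting (9) and (43) in (49), there results

$$ \frac{\partial S'}{\partial k_i} = \lim_{\Delta k_i \to 0} \frac{1}{\Delta k_i} \, \frac{1}{2\pi} \int_{-\pi}^{\pi} \log \left[ \frac{V_p(k_i)}{V_p(k_i + \Delta k_i)} \left| \frac{A(k_i + \Delta k_i, e^{j\omega})}{A(k_i, e^{j\omega})} \right|^2 \right] d\omega . \tag{50} $$

It can be shown [7] that if the zeros of A(z) lie inside the
unit circle, then

$$ \int_{-\pi}^{\pi} \log |A(e^{j\omega})|^2 \, d\omega = 0 . \tag{51} $$

Substituting (51) in (50), and noting that $V_p$ is independent
of $\omega$, we obtain

$$ \frac{\partial S'}{\partial k_i} = \lim_{\Delta k_i \to 0} \frac{\log V_p(k_i) - \log V_p(k_i + \Delta k_i)}{\Delta k_i} \tag{52} $$

or

$$ \frac{\partial S'}{\partial k_i} = - \frac{\partial \log V_p}{\partial k_i} . $$

Using (8) in (52) we obtain the desired result

$$ \frac{\partial S'}{\partial k_i} = \frac{2 k_i}{1 - k_i^2} . \tag{53} $$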

It is important to note that this is an exact result and it
is true for each reflection coefficient, independent of the
values of the other coefficients. Also, a plot of $\partial S' / \partial k_i$
versus $k_i$ gives a U-shaped curve. Therefore, the spectral
sensitivity in (53) has the same general properties as the
spectral sensitivity $\partial S / \partial k_i$ obtained experimentally in Section
V. The only difference between the two is the actual shape
of the sensitivity curve.
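The result in (53) can be checked numerically against the integral form in (49). The Python sketch below is an illustration only. It assumes the step-up recursion that is consistent with the second order filter in (30), takes the normalized error as $V_p = \prod_i (1 - k_i^2)$, and sets the gain so that $G^2 = V_p$; the last choice is an assumption about (9), which is not reproduced in this section. All function names are hypothetical.

```python
import cmath
import math


def step_up(ks: list) -> list:
    """Coefficients a_1..a_p of A(z) = 1 + sum_j a_j z^-j from reflection coefficients."""
    a = []
    for k in ks:
        a = [a[j] + k * a[len(a) - 1 - j] for j in range(len(a))] + [k]
    return a


def normalized_error(ks: list) -> float:
    """V_p = prod_i (1 - k_i^2)."""
    v = 1.0
    for k in ks:
        v *= 1.0 - k * k
    return v


def log_spectrum(ks: list, omega: float) -> float:
    """log P(omega) with P = V_p / |A(e^{j omega})|^2 (G^2 = V_p is assumed)."""
    a = step_up(ks)
    A = 1.0 + sum(c * cmath.exp(-1j * omega * (j + 1)) for j, c in enumerate(a))
    return math.log(normalized_error(ks)) - 2.0 * math.log(abs(A))


def sensitivity_numeric(ks: list, i: int, dk: float = 1e-6, n_freq: int = 2048) -> float:
    """Finite-difference form of (49): mean over omega of log[P(k_i)/P(k_i+dk)], over dk."""
    shifted = list(ks)
    shifted[i] += dk
    total = 0.0
    for n in range(n_freq):
        omega = -math.pi + 2.0 * math.pi * n / n_freq
        total += log_spectrum(ks, omega) - log_spectrum(shifted, omega)
    return total / n_freq / dk


def sensitivity_closed_form(k: float) -> float:
    """Closed form of (53): 2 k / (1 - k^2)."""
    return 2.0 * k / (1.0 - k * k)


if __name__ == "__main__":
    ks = [0.6, -0.4, 0.3]        # illustrative reflection coefficients
    for i, k in enumerate(ks):
        print(i + 1, sensitivity_numeric(ks, i), sensitivity_closed_form(k))
```

The two printed columns should agree closely for each coefficient, whatever the values of the other coefficients, which is the independence property noted above.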

Substituting (53) in the optimality condition (25) and
integrating it with L=1, we obtain the following optimal
mapping for the sensitivity measure (47):

$$ f'(k) = \log \frac{1}{1 - k^2} . \tag{54} $$

From (8) and (54), it is interesting to observe that $f'(k_i)$
is equal to the logarithm of the ratio of the normalized
errors (or log error ratio) associated with the linear
predictors of orders i-1 and i:

$$ f'(k_i) = \log \frac{V_{i-1}}{V_i} . \tag{55} $$

We have experimentally investigated the quantization
properties resulting from the mappings given by (28) and
(55). Through informal listening tests we have found that
the use of the log area ratios for quantization leads to
uniformly better speech quality than that obtained using the
log error ratios.

It is interesting to note that the only difference
between the two sensitivity measures given by (20) and (49)
is the lack of an absolute value sign inside the integral in
(49). This makes the sensitivity measure in (49) less
powerful, because spectral deviations when
$P(k_i, \omega) > P(k_i + \Delta k_i, \omega)$ can cancel deviations when
$P(k_i + \Delta k_i, \omega) > P(k_i, \omega)$. Both of these cases contribute to the
total spectral deviation in (20). This is another reason
why (20) is to be preferred over (49) as a definition of
spectral sensitivity, and therefore why the log area ratios
are to be preferred over the log error ratios as
transmission parameters. (See [14] for further comparison
of the spectral deviations in (20) and (47).)

VIII. CONCLUSIONS


We have dealt with the problem of quantization of
transmission parameters in linear predictive speech
compression systems. Several alternate sets of transmission
parameters were considered and their relative quantization
properties were presented. The results of this study have
shown that the reflection coefficients are the best set for
use as transmission parameters. Specifically, the
reflection coefficients preserve the stability of the linear
predictor under quantization, and possess a natural ordering,
a property that can be used in the design of better
quantization schemes. The quantization of the reflection
coefficients was then examined in more detail using a
spectral sensitivity measure.

The spectral sensitivity of a given reflection
coefficient was defined in terms of the absolute spectral
deviation due to a small perturbation in the reflection
coefficient. Experimental study of this spectral
sensitivity measure has shown that a reflection coefficient
has a high sensitivity for magnitudes close to 1 and a low
sensitivity near 0. Further, all the reflection
coefficients have approximately the same sensitivity
behavior, irrespective of the particular speech sound to
which they correspond. A prototype sensitivity function was
obtained experimentally by averaging the sensitivity values
over the various reflection coefficients and over a large
number of speech sounds. We then developed an optimal
quantization procedure for the reflection coefficients.
This consisted of finding a suitable mapping that transforms
the reflection coefficients to a set of parameters having a
flat or constant sensitivity behavior. Using an analytical
function that closely approximates the averaged sensitivity of
the reflection coefficients, we demonstrated that the
logarithms of the ratios of area coefficients (or log area
ratios) possess approximately optimal quantization
properties.

An  optimal  solution  was  then  derived  for  the  problem  of 
bit  allocation  among  the  different  parameters.  This  was 
done  by  minimizing  the  maximum  spectral  deviation  due  to 
quantization.  For  the  log  area  ratios,  this  optimal 
solution  reduces  to  using  equal  quantization  steps  for  all 
the  parameters. 

Finally, motivated to use an error measure similar to
the one used in linear predictive analysis, we have provided
an alternate definition of spectral sensitivity. An
analytical evaluation of this spectral sensitivity for the
reflection coefficients has shown that the logarithms of the
ratios of normalized errors of linear predictors of
successive orders (or log error ratios) exhibit optimal
quantization properties. However, informal listening tests
have indicated that the use of log area ratios for
quantization leads to better synthesis than the use of log
error ratios. This further implies that the definition of
spectral sensitivity that resulted in the log area ratios
gives a superior measure of spectral sensitivity for the
purpose of quantization studies.